Smoothing issues in the structured language model

Authors

  • Woosung Kim
  • Sanjeev Khudanpur
  • Jun Wu
Abstract

The Structured Language Model (SLM) recently introduced by Chelba and Jelinek is a powerful general formalism for exploiting syntactic dependencies in a left-to-right language model for applications such as speech and handwriting recognition, spelling correction, and machine translation. Unlike for traditional N-gram models, optimal smoothing techniques (discounting methods and hierarchical structures for back-off) are still being developed for the SLM. In the SLM, the statistical dependencies of a word on immediately preceding words, preceding syntactic heads, non-terminal labels, etc., are parameterized as overlapping N-gram dependencies. Statistical dependencies in the parser and tagger used by the SLM also have an N-gram-like structure. Deleted interpolation has so far been used to combine these N-gram-like models. We demonstrate on two different corpora, WSJ and Switchboard, that more recent modified back-off strategies and nonlinear interpolation methods considerably lower the perplexity of the SLM. An improvement in word error rate is also demonstrated on the Switchboard corpus.
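As a rough illustration of the linear interpolation idea that deleted interpolation builds on, the sketch below mixes a maximum-likelihood bigram estimate with a unigram estimate. It is not the SLM or the authors' method: the toy corpus, the function names, and the fixed weight `lam` are assumptions made for this example, and in deleted interpolation proper the weights would be estimated on held-out data rather than fixed by hand.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram and bigram counts from a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def interpolated_prob(word, prev, unigrams, bigrams, lam=0.7):
    """Return lam * P_ML(word | prev) + (1 - lam) * P_ML(word).

    lam is a hand-picked weight (an assumption of this sketch); deleted
    interpolation would instead estimate it on held-out data.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    # Number of times prev occurs as a bigram history.
    hist = sum(c for (h, _), c in bigrams.items() if h == prev)
    if hist == 0:
        # Unseen history: fall back entirely to the unigram estimate.
        return p_uni
    p_bi = bigrams[(prev, word)] / hist
    return lam * p_bi + (1 - lam) * p_uni


# Toy corpus, purely illustrative.
tokens = "the cat sat on the mat".split()
unigrams, bigrams = train_counts(tokens)

p_seen = interpolated_prob("cat", "the", unigrams, bigrams)    # bigram was observed
p_unseen = interpolated_prob("mat", "cat", unigrams, bigrams)  # bigram was not observed
```

Because the unigram term is always positive for in-vocabulary words, an unseen bigram such as ("cat", "mat") still receives nonzero probability, which is the basic purpose of smoothing.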




Publication date: 2001